A Cepstral Domain Maximum Likelihood Beamformer for Speech Recognition

Authors

  • Dominik Raub
  • John McDonough
  • Matthias Wölfel
Abstract

Recent work by Seltzer [1] indicates that classical approaches to beamforming, which minimize output power while enforcing a distortionless constraint, do not yield optimal results in terms of word error rate (WER) on speech recognition tasks. This problem can be traced back to the mismatch between the target criterion of classical adaptive beamformers, which is optimization of the signal-to-noise ratio, and the actual target criterion, which is the reduction of the recognizer's WER. Following an approach by Seltzer [1], we therefore investigate the performance of an alternative error criterion, which attempts to optimize the beamformer weights so as to improve the likelihoods along the recognizer's Viterbi path for each utterance. This criterion matches the goal of lower WERs more closely and therefore leads to better recognition results.
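The abstract does not give the details of the optimization, so the following is only a toy sketch of the general idea of tuning beamformer weights against a cepstral-domain likelihood. It assumes one real scalar weight per microphone channel, a fixed per-frame diagonal Gaussian (`means`, `variances`) standing in for the state densities along the recognizer's Viterbi path, and a finite-difference gradient ascent. All function names and parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def dct_matrix(n_ceps, n_bins):
    # DCT-II basis that maps a log power spectrum to cepstral coefficients.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_bins)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_bins)

def cepstra(weights, spectra, dct):
    # spectra: complex array of shape (channels, frames, bins).
    # Assumption: one real scalar weight per channel (a toy weighted sum).
    beam = np.tensordot(weights, spectra, axes=(0, 0))   # (frames, bins)
    log_power = np.log(np.abs(beam) ** 2 + 1e-10)
    return log_power @ dct.T                              # (frames, n_ceps)

def log_likelihood(weights, spectra, dct, means, variances):
    # Diagonal-Gaussian log-likelihood of the cepstra, summed over frames.
    # means/variances play the role of the aligned state parameters.
    diff = cepstra(weights, spectra, dct) - means
    return -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances))

def optimize_weights(weights, spectra, dct, means, variances,
                     steps=30, lr=0.01, eps=1e-5):
    # Finite-difference gradient ascent; a step is kept only if it
    # increases the likelihood, otherwise the step size is halved.
    w = np.asarray(weights, dtype=float).copy()
    best = log_likelihood(w, spectra, dct, means, variances)
    for _ in range(steps):
        grad = np.zeros_like(w)
        for i in range(w.size):
            bumped = w.copy()
            bumped[i] += eps
            grad[i] = (log_likelihood(bumped, spectra, dct, means, variances)
                       - best) / eps
        norm = np.linalg.norm(grad)
        if norm == 0:
            break
        candidate = w + lr * grad / norm
        score = log_likelihood(candidate, spectra, dct, means, variances)
        if score > best:
            w, best = candidate, score
        else:
            lr *= 0.5
    return w, best
```

In a real system the alignment would come from a recognizer pass and would typically be re-estimated as the weights change; here it is held fixed to keep the sketch short.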


Similar resources

Fast speech adaptation in linear spectral domain for additive and convolutional noise

In this paper, we propose a transform-based adaptation technique for robust speech recognition in unknown environments. It uses the maximum likelihood spectral transform (MLST) algorithm with additive and convolutional noise parameters. Many adaptation algorithms have previously been proposed in the cepstral domain. Though the cepstral domain may be appropriate for speech recognition, it is dif...


Reducing the effects of linear channel distortion on continuous speech recognition

Linear channel compensation in speech recognition typically involves estimating an additive shift in the cepstral domain. This paper explores both Bayesian and maximum likelihood techniques to transform either the features or the model parameters. Experiments on the Macrophone corpus show error rate reductions over cepstral mean subtraction for short utterances.


Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...


A cepstral domain maximum likelihood beamformer for speech recognition

Recent work by Seltzer [1] indicates that classical approaches to beamforming, which minimize output power while enforcing a distortionless constraint, do not yield optimal results in terms of word error rate (WER) on speech recognition tasks. This problem can be traced back to the mismatch between the target criterion of classical adaptive beamformers, which is optimization of the signal to noise r...


Speech recognition in noisy environments using first-order vector Taylor series

In this paper, we generalize relations between clean and noisy speech signal using vector Taylor series (VTS) expansion for noise-robust speech recognition. We use it for both the noisy data compensation and hidden Markov model (HMM) parameter adaptation, and apply it for the cepstral domain directly, while Moreno used it to estimate the log-spectral parameters. Also, we develop a detailed ...




Publication date: 2004